A Statistical Model for Word Discovery in Transcribed Speech
Author
Abstract
English speech lacks the acoustic analog of the blank spaces that people are accustomed to seeing between words in written text. Discovering words in continuous speech is therefore an interesting problem, and one that has been treated at length in the literature. The issue is particularly prominent in the parsing of written text in languages that do not explicitly include spaces between words, and in child language acquisition, if we assume that children start out with little or no knowledge of the inventory of words the language possesses. Although speech lacks explicit demarcation of word boundaries, it undoubtedly possesses other significant cues for word discovery. It is still of interest to see exactly how much can be achieved without incorporating these other cues; that is, we are interested in the performance of a bare-bones language model. For example, there is much evidence that stress patterns (Jusczyk, Cutler, and Redanz, 1993; Cutler and Carter, 1987) and phonotactics of speech (Mattys and Jusczyk, 1999) are of considerable aid in word discovery. But a bare-bones
Similar resources
A statistical model for word discovery in child directed speech
A statistical model for segmentation and word discovery in child directed speech is presented. An incremental unsupervised learning algorithm to infer word boundaries based on this model is described and results of empirical tests showing that the algorithm is competitive with other models that have been used for similar tasks are also presented.
Phonological Mean Length of Utterance in 48-60-Month-old Persian-speaking Children with Isfahani Accent: Comparison of Story Generation and Conversation Samples
Objective: Phonological Mean Length of Utterance (PMLU), a quantitative measure for assessment of phonological skills, has been considered in developmental studies as a diagnostic and clinical criterion in phonological development. Moreover, it is an indicator of the efficacy of the intervention. The PMLU is a word-level measure that can be calculated on the child’s transcribed speech sampl...
The Grammatical Abilities in a Speech Sample of Persian-speaking Children With Cochlear Implants
Objectives: Studies have reported that children with a Cochlear Implant (CI) present difficulties in grammatical acquisition. The Persian language is inflectional. The present study aimed to compare word-level inflections in the language samples of CI recipients and healthy-hearing children. Methods: Thirty children were recruited in this descriptive-analytical cross-sectional study. The Language S...
Prosodic features for a maximum entropy language model
A statistical language model attempts to characterise the patterns present in a natural language as a probability distribution defined over word sequences. Typically, such models are trained using word co-occurrence statistics from a large sample of text. In some language modelling applications, such as automatic speech recognition (ASR), the availability of acoustic data provides an additional source...
Edit Detection and Parsing for Transcribed Speech
We present a simple architecture for parsing transcribed speech in which an edited-word detector first removes such words from the sentence string, and then a standard statistical parser trained on transcribed speech parses the remaining words. The edit detector achieves a misclassification rate on edited words of 2.2%. (The NULL-model, which marks everything as not edited, has an error rate of...
Journal title:
- Computational Linguistics
Volume 27, Issue
Pages -
Publication date 2001